Level 2: Restaurant Data Analytics | Data Analysis¶

  • Task 1: Restaurant Ratings

  • Task 2: Cuisine Combination

  • Task 3: Geographic Analysis

  • Task 4: Restaurant Chains

Step 1: Import necessary Python libraries.¶

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tabulate import tabulate
import itertools
import plotly.express as px
from sklearn.cluster import KMeans

Step 2: Load the dataset into a DataFrame.¶

In [2]:
# Read the csv file using pandas read_csv
resto_df = pd.read_csv(r"Dataset .csv")
resto_df
Out[2]:
Restaurant ID Restaurant Name Country Code City Address Locality Locality Verbose Longitude Latitude Cuisines ... Currency Has Table booking Has Online delivery Is delivering now Switch to order menu Price range Aggregate rating Rating color Rating text Votes
0 6317637 Le Petit Souffle 162 Makati City Third Floor, Century City Mall, Kalayaan Avenu... Century City Mall, Poblacion, Makati City Century City Mall, Poblacion, Makati City, Mak... 121.027535 14.565443 French, Japanese, Desserts ... Botswana Pula(P) Yes No No No 3 4.8 Dark Green Excellent 314
1 6304287 Izakaya Kikufuji 162 Makati City Little Tokyo, 2277 Chino Roces Avenue, Legaspi... Little Tokyo, Legaspi Village, Makati City Little Tokyo, Legaspi Village, Makati City, Ma... 121.014101 14.553708 Japanese ... Botswana Pula(P) Yes No No No 3 4.5 Dark Green Excellent 591
2 6300002 Heat - Edsa Shangri-La 162 Mandaluyong City Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal... Edsa Shangri-La, Ortigas, Mandaluyong City Edsa Shangri-La, Ortigas, Mandaluyong City, Ma... 121.056831 14.581404 Seafood, Asian, Filipino, Indian ... Botswana Pula(P) Yes No No No 4 4.4 Green Very Good 270
3 6318506 Ooma 162 Mandaluyong City Third Floor, Mega Fashion Hall, SM Megamall, O... SM Megamall, Ortigas, Mandaluyong City SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.056475 14.585318 Japanese, Sushi ... Botswana Pula(P) No No No No 4 4.9 Dark Green Excellent 365
4 6314302 Sambo Kojin 162 Mandaluyong City Third Floor, Mega Atrium, SM Megamall, Ortigas... SM Megamall, Ortigas, Mandaluyong City SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.057508 14.584450 Japanese, Korean ... Botswana Pula(P) Yes No No No 4 4.8 Dark Green Excellent 229
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9546 5915730 Naml۱ Gurme 208 ��stanbul Kemanke�� Karamustafa Pa��a Mahallesi, R۱ht۱m ... Karak�_y Karak�_y, ��stanbul 28.977392 41.022793 Turkish ... Turkish Lira(TL) No No No No 3 4.1 Green Very Good 788
9547 5908749 Ceviz A��ac۱ 208 ��stanbul Ko��uyolu Mahallesi, Muhittin ��st�_nda�� Cadd... Ko��uyolu Ko��uyolu, ��stanbul 29.041297 41.009847 World Cuisine, Patisserie, Cafe ... Turkish Lira(TL) No No No No 3 4.2 Green Very Good 1034
9548 5915807 Huqqa 208 ��stanbul Kuru�_e��me Mahallesi, Muallim Naci Caddesi, N... Kuru�_e��me Kuru�_e��me, ��stanbul 29.034640 41.055817 Italian, World Cuisine ... Turkish Lira(TL) No No No No 4 3.7 Yellow Good 661
9549 5916112 A���k Kahve 208 ��stanbul Kuru�_e��me Mahallesi, Muallim Naci Caddesi, N... Kuru�_e��me Kuru�_e��me, ��stanbul 29.036019 41.057979 Restaurant Cafe ... Turkish Lira(TL) No No No No 4 4.0 Green Very Good 901
9550 5927402 Walter's Coffee Roastery 208 ��stanbul Cafea��a Mahallesi, Bademalt۱ Sokak, No 21/B, ... Moda Moda, ��stanbul 29.026016 40.984776 Cafe ... Turkish Lira(TL) No No No No 2 4.0 Green Very Good 591

9551 rows × 21 columns

Step 3: Basic Inspection on given dataset¶

  • Top 5 rows - using head
In [3]:
resto_df.head()
Out[3]:
Restaurant ID Restaurant Name Country Code City Address Locality Locality Verbose Longitude Latitude Cuisines ... Currency Has Table booking Has Online delivery Is delivering now Switch to order menu Price range Aggregate rating Rating color Rating text Votes
0 6317637 Le Petit Souffle 162 Makati City Third Floor, Century City Mall, Kalayaan Avenu... Century City Mall, Poblacion, Makati City Century City Mall, Poblacion, Makati City, Mak... 121.027535 14.565443 French, Japanese, Desserts ... Botswana Pula(P) Yes No No No 3 4.8 Dark Green Excellent 314
1 6304287 Izakaya Kikufuji 162 Makati City Little Tokyo, 2277 Chino Roces Avenue, Legaspi... Little Tokyo, Legaspi Village, Makati City Little Tokyo, Legaspi Village, Makati City, Ma... 121.014101 14.553708 Japanese ... Botswana Pula(P) Yes No No No 3 4.5 Dark Green Excellent 591
2 6300002 Heat - Edsa Shangri-La 162 Mandaluyong City Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal... Edsa Shangri-La, Ortigas, Mandaluyong City Edsa Shangri-La, Ortigas, Mandaluyong City, Ma... 121.056831 14.581404 Seafood, Asian, Filipino, Indian ... Botswana Pula(P) Yes No No No 4 4.4 Green Very Good 270
3 6318506 Ooma 162 Mandaluyong City Third Floor, Mega Fashion Hall, SM Megamall, O... SM Megamall, Ortigas, Mandaluyong City SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.056475 14.585318 Japanese, Sushi ... Botswana Pula(P) No No No No 4 4.9 Dark Green Excellent 365
4 6314302 Sambo Kojin 162 Mandaluyong City Third Floor, Mega Atrium, SM Megamall, Ortigas... SM Megamall, Ortigas, Mandaluyong City SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.057508 14.584450 Japanese, Korean ... Botswana Pula(P) Yes No No No 4 4.8 Dark Green Excellent 229

5 rows × 21 columns

  • bottom 5 rows using tail
In [4]:
resto_df.tail()
Out[4]:
Restaurant ID Restaurant Name Country Code City Address Locality Locality Verbose Longitude Latitude Cuisines ... Currency Has Table booking Has Online delivery Is delivering now Switch to order menu Price range Aggregate rating Rating color Rating text Votes
9546 5915730 Naml۱ Gurme 208 ��stanbul Kemanke�� Karamustafa Pa��a Mahallesi, R۱ht۱m ... Karak�_y Karak�_y, ��stanbul 28.977392 41.022793 Turkish ... Turkish Lira(TL) No No No No 3 4.1 Green Very Good 788
9547 5908749 Ceviz A��ac۱ 208 ��stanbul Ko��uyolu Mahallesi, Muhittin ��st�_nda�� Cadd... Ko��uyolu Ko��uyolu, ��stanbul 29.041297 41.009847 World Cuisine, Patisserie, Cafe ... Turkish Lira(TL) No No No No 3 4.2 Green Very Good 1034
9548 5915807 Huqqa 208 ��stanbul Kuru�_e��me Mahallesi, Muallim Naci Caddesi, N... Kuru�_e��me Kuru�_e��me, ��stanbul 29.034640 41.055817 Italian, World Cuisine ... Turkish Lira(TL) No No No No 4 3.7 Yellow Good 661
9549 5916112 A���k Kahve 208 ��stanbul Kuru�_e��me Mahallesi, Muallim Naci Caddesi, N... Kuru�_e��me Kuru�_e��me, ��stanbul 29.036019 41.057979 Restaurant Cafe ... Turkish Lira(TL) No No No No 4 4.0 Green Very Good 901
9550 5927402 Walter's Coffee Roastery 208 ��stanbul Cafea��a Mahallesi, Bademalt۱ Sokak, No 21/B, ... Moda Moda, ��stanbul 29.026016 40.984776 Cafe ... Turkish Lira(TL) No No No No 2 4.0 Green Very Good 591

5 rows × 21 columns

  • Inspecting Column Names and Data Types
In [5]:
resto_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9551 entries, 0 to 9550
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Restaurant ID         9551 non-null   int64  
 1   Restaurant Name       9551 non-null   object 
 2   Country Code          9551 non-null   int64  
 3   City                  9551 non-null   object 
 4   Address               9551 non-null   object 
 5   Locality              9551 non-null   object 
 6   Locality Verbose      9551 non-null   object 
 7   Longitude             9551 non-null   float64
 8   Latitude              9551 non-null   float64
 9   Cuisines              9542 non-null   object 
 10  Average Cost for two  9551 non-null   int64  
 11  Currency              9551 non-null   object 
 12  Has Table booking     9551 non-null   object 
 13  Has Online delivery   9551 non-null   object 
 14  Is delivering now     9551 non-null   object 
 15  Switch to order menu  9551 non-null   object 
 16  Price range           9551 non-null   int64  
 17  Aggregate rating      9551 non-null   float64
 18  Rating color          9551 non-null   object 
 19  Rating text           9551 non-null   object 
 20  Votes                 9551 non-null   int64  
dtypes: float64(3), int64(5), object(13)
memory usage: 1.5+ MB
  • Checking for Missing Values
In [6]:
resto_df.isnull().sum()
Out[6]:
Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                9
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64
In [7]:
cuisines = resto_df['Cuisines'].dropna().str.split(", ").explode() 
  • Basic Statistical Summary
In [8]:
resto_df.describe()
Out[8]:
Restaurant ID Country Code Longitude Latitude Average Cost for two Price range Aggregate rating Votes
count 9.551000e+03 9551.000000 9551.000000 9551.000000 9551.000000 9551.000000 9551.000000 9551.000000
mean 9.051128e+06 18.365616 64.126574 25.854381 1199.210763 1.804837 2.666370 156.909748
std 8.791521e+06 56.750546 41.467058 11.007935 16121.183073 0.905609 1.516378 430.169145
min 5.300000e+01 1.000000 -157.948486 -41.330428 0.000000 1.000000 0.000000 0.000000
25% 3.019625e+05 1.000000 77.081343 28.478713 250.000000 1.000000 2.500000 5.000000
50% 6.004089e+06 1.000000 77.191964 28.570469 400.000000 2.000000 3.200000 31.000000
75% 1.835229e+07 1.000000 77.282006 28.642758 700.000000 2.000000 3.700000 131.000000
max 1.850065e+07 216.000000 174.832089 55.976980 800000.000000 4.000000 4.900000 10934.000000
  • Checking Unique Values
In [9]:
resto_df.nunique()
Out[9]:
Restaurant ID           9551
Restaurant Name         7446
Country Code              15
City                     141
Address                 8918
Locality                1208
Locality Verbose        1265
Longitude               8120
Latitude                8677
Cuisines                1825
Average Cost for two     140
Currency                  12
Has Table booking          2
Has Online delivery        2
Is delivering now          2
Switch to order menu       1
Price range                4
Aggregate rating          33
Rating color               6
Rating text                6
Votes                   1012
dtype: int64
  • Checking Shape
In [10]:
resto_df.shape
Out[10]:
(9551, 21)

Task 1: Restaurant Ratings¶

  • **Analyze the distribution of aggregate

ratings and determine the most common rating range.**

In [11]:
agg_val_count = resto_df['Aggregate rating'].value_counts()
agg_val_count
Out[11]:
Aggregate rating
0.0    2148
3.2     522
3.1     519
3.4     498
3.3     483
3.5     480
3.0     468
3.6     458
3.7     427
3.8     400
2.9     381
3.9     335
2.8     315
4.1     274
4.0     266
2.7     250
4.2     221
2.6     191
4.3     174
4.4     144
2.5     110
4.5      95
2.4      87
4.6      78
4.9      61
2.3      47
4.7      42
2.2      27
4.8      25
2.1      15
2.0       7
1.9       2
1.8       1
Name: count, dtype: int64
In [24]:
rating_most_common = agg_val_count.idxmax()
print(f'The Most Common rating range is: {rating_most_common}')
The Most Common rating range is: 0.0
In [12]:
plt.figure(figsize=(10, 6))
plt.hist(resto_df['Aggregate rating'], bins=30, color='#fca311', edgecolor='#14213d')
plt.xlabel('Aggregate Rating')
plt.ylabel('Count')
plt.title('Distribution of Aggregate Rating')
plt.show()
No description has been provided for this image
  • **Calculate the average number of votes

received by restaurants.**

In [13]:
avg_vote = round(resto_df['Votes'].mean(), 2)
print(f'The Average number of Votes received by restaurants : {avg_vote}')
The Average number of Votes received by restaurants : 156.91

Task 2: Cuisine Combination¶

  • **Identify the most common combinations of

cuisines in the dataset.**

In [14]:
common_cuisines_combinations = resto_df.groupby('Cuisines')['Aggregate rating'].mean().sort_values(ascending=False)
top_10_combinations = common_cuisines_combinations.head(10)
print(f'The Top 10 most common combinations are : {top_10_combinations}')
The Top 10 most common combinations are : Cuisines
Italian, Deli               4.9
Hawaiian, Seafood           4.9
American, Sandwich, Tea     4.9
Continental, Indian         4.9
European, Asian, Indian     4.9
European, Contemporary      4.9
European, German            4.9
BBQ, Breakfast, Southern    4.9
American, Coffee and Tea    4.9
Sunda, Indonesian           4.9
Name: Aggregate rating, dtype: float64
  • **Determine if certain cuisine combinations

tend to have higher ratings.**

In [15]:
max_rating = common_cuisines_combinations.iloc[0]
print(f'The Max Rating is: {max_rating}')
The Max Rating is: 4.9
In [16]:
max_rated_rest = resto_df.loc[resto_df['Aggregate rating'] == max_rating]
print('Restorents having the Maximum Ratings: ')
max_rated_rest['Restaurant Name']
Restorents having the Maximum Ratings: 
Out[16]:
3                                           Ooma
8       Spiral - Sofitel Philippine Plaza Manila
10                              Silantro Fil-Mex
39                                    Coco Bambu
48                            Braseiro da G��vea
                          ...                   
9484              Restaurant Mosaic @ The Orient
9514                            Ministry of Crab
9524                                Gaga Manjero
9538                                   Starbucks
9540                            Draft Gastro Pub
Name: Restaurant Name, Length: 61, dtype: object

Task 3: Geographic Analysis¶

  • **Plot the locations of restaurants on a

map using longitude and latitude coordinates.**

In [17]:
resto_df[['Latitude', 'Longitude']]
Out[17]:
Latitude Longitude
0 14.565443 121.027535
1 14.553708 121.014101
2 14.581404 121.056831
3 14.585318 121.056475
4 14.584450 121.057508
... ... ...
9546 41.022793 28.977392
9547 41.009847 29.041297
9548 41.055817 29.034640
9549 41.057979 29.036019
9550 40.984776 29.026016

9551 rows × 2 columns

In [5]:
print(resto_df["Longitude"].isnull().sum())
print(resto_df["Latitude"].isnull().sum())
0
0
In [18]:
# plot the restaurents on the map
fig = px.scatter_mapbox(resto_df,lat='Latitude',  lon='Longitude',
     hover_name='Restaurant Name',       color_discrete_sequence=['green'],
     zoom=2,
)
fig.update_layout(
    mapbox_style="open-street-map",
)
  • Identify any patterns or clusters of restaurants in specific areas.
In [19]:
X=resto_df[['Latitude','Longitude']]
num_cluster=5
# k mean clustering
kmeans=KMeans(n_clusters=num_cluster,n_init=10,random_state=42)
resto_df['Cluster']=kmeans.fit_predict(X)
In [20]:
# plot on the map
fig=px.scatter_mapbox(resto_df,lat='Latitude',lon='Longitude',
    hover_name='Restaurant Name', hover_data=['Cuisines','Country Code'],
    color='Cluster',    color_continuous_scale='reds',
    zoom=2,
)
fig.update_layout(
    mapbox_style="open-street-map",
)
In [21]:
# Plotting the clusters
plt.scatter(resto_df['Longitude'], resto_df['Latitude'], c=resto_df['Cluster'], cmap='rainbow')
plt.title('Restaurant Clusters')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()
No description has been provided for this image

Task 4: Restaurant Chains¶

  • Identify if there are any restaurant chains present in the dataset
In [22]:
resto_df.head(2)
Out[22]:
Restaurant ID Restaurant Name Country Code City Address Locality Locality Verbose Longitude Latitude Cuisines ... Has Table booking Has Online delivery Is delivering now Switch to order menu Price range Aggregate rating Rating color Rating text Votes Cluster
0 6317637 Le Petit Souffle 162 Makati City Third Floor, Century City Mall, Kalayaan Avenu... Century City Mall, Poblacion, Makati City Century City Mall, Poblacion, Makati City, Mak... 121.027535 14.565443 French, Japanese, Desserts ... Yes No No No 3 4.8 Dark Green Excellent 314 0
1 6304287 Izakaya Kikufuji 162 Makati City Little Tokyo, 2277 Chino Roces Avenue, Legaspi... Little Tokyo, Legaspi Village, Makati City Little Tokyo, Legaspi Village, Makati City, Ma... 121.014101 14.553708 Japanese ... Yes No No No 3 4.5 Dark Green Excellent 591 0

2 rows × 22 columns

In [23]:
res_count=resto_df['Restaurant Name'].value_counts()
potential_chains=res_count[res_count > 10].index
print("Potential restaurant chains:")
for chain in potential_chains:
    print(f"--{chain}")
Potential restaurant chains:
--Cafe Coffee Day
--Domino's Pizza
--Subway
--Green Chick Chop
--McDonald's
--Keventers
--Pizza Hut
--Giani
--Baskin Robbins
--Barbeque Nation
--Giani's
--Barista
--Dunkin' Donuts
--Costa Coffee
--Pind Balluchi
--Wah Ji Wah
--Twenty Four Seven
--Pizza Hut Delivery
--Sagar Ratna
--Republic of Chicken
--KFC
--Starbucks
--Chaayos
--Burger King
--Haldiram's
--Shree Rathnam
--Frontier
--Moti Mahal Delux
--Bikanervala
--Aggarwal Sweets
--Behrouz Biryani
--Karim's
--Bikaner Sweets
--Chicago Pizza
--Apni Rasoi
--34, Chowringhee Lane
--Wow! Momo
--Madras Cafe
--Burger Point
  • Analyze the ratings and popularity of different restaurant chains.
In [24]:
restaurant_chain_stats=resto_df.groupby('Restaurant Name').agg({
    'Aggregate rating':'mean',
    'Votes':'sum',
}).reset_index()

restaurant_chain_stats.columns=['Restaurant Name','Average rating','Total Votes']
restaurant_chain_stats=restaurant_chain_stats.sort_values(by='Total Votes',ascending=False)
print("Restaurant Chain Rating and Popularity Analysis (Sorted by Total Votes):")
print(restaurant_chain_stats.head(20))
Restaurant Chain Rating and Popularity Analysis (Sorted by Total Votes):
                Restaurant Name  Average rating  Total Votes
663             Barbeque Nation        4.353846        28142
101   AB's - Absolute Barbecues        4.825000        13400
6943                       Toit        4.800000        10934
785                   Big Chill        4.475000        10853
2297                 Farzi Cafe        4.366667        10098
6988                   Truffles        3.950000         9682
1510                    Chili's        4.580000         8156
2879           Hauz Khas Social        4.300000         7931
3261               Joey's Pizza        4.250000         7807
4902                  Peter Cat        4.300000         7574
796             Big Yellow Door        4.266667         7511
5571            Saravana Bhavan        4.133333         7238
6080                  Starbucks        3.805556         7139
4941           Pirates of Grill        4.025000         7091
3405                    Karim's        3.030769         6878
2098             Domino's Pizza        2.740506         6643
6106                     Subway        2.907937         6124
2145             Dunkin' Donuts        3.136364         5974
783                 Big Brewsky        4.500000         5705
4924              Pind Balluchi        2.630000         5582

Observations

  • Restaurant Chain Rating and Popularity Analysis (Sorted by Total Votes)
    • Barbeque Nation
    • AB's - Absolute Barbecues
    • Toit
    • Big Chill
    • Farzi Cafe